MOSAIC for Multiple-Reward Environments
Authors
Abstract
Reinforcement learning (RL) can provide a basic framework for autonomous robots to learn to control and maximize future cumulative rewards in complex environments. To achieve high performance, RL controllers must take into account the complex external dynamics for movements and the task (reward function) and optimize control commands. For example, a robot playing tennis and squash needs to cope with the different dynamics of a tennis or squash racket and with dynamic environmental factors such as wind. In addition, the robot has to tailor its tactics simultaneously to the rules of either game. This double complexity of external dynamics and reward function deepens when both the multiple dynamics and the multiple reward functions switch implicitly, as in a real (multi-agent) game of tennis, where a player cannot observe the intentions of her opponents or her partner. The robot must consider its opponent's and its partner's unobservable behavioral goals (reward functions). In this article, we address how an RL agent should be designed to handle such double complexity of dynamics and reward. We previously proposed modular selection and identification for control (MOSAIC) to cope with nonstationary dynamics, in which an appropriate controller is selected and learned among many candidates based on the prediction error of its paired dynamics predictor: the forward model. Here we extend this framework to RL and propose the MOSAIC-MR architecture. It resembles MOSAIC in spirit and selects and learns an appropriate RL controller based on the RL controller's TD error, which in turn uses the errors of the dynamics predictors (forward models) and the reward predictors. Furthermore, unlike other MOSAIC variants for RL, RL controllers are not a priori paired with fixed predictors of dynamics and rewards.
The simulation results demonstrate that MOSAIC-MR outperforms its counterparts because of this flexible association among RL controllers, forward models, and reward predictors.
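The selection principle described in the abstract, choosing among candidate RL controllers according to how well their associated forward models and reward predictors explain recent observations, can be sketched as a softmax over module responsibilities. The Gaussian error model, the function name, and the numbers below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def mosaic_mr_responsibilities(dyn_errors, reward_errors, sigma=1.0):
    """Soft responsibility of each candidate module, assuming Gaussian
    likelihoods of the forward-model and reward-predictor errors
    (an illustrative choice, not the paper's exact equations)."""
    combined = np.asarray(dyn_errors, dtype=float) ** 2 \
             + np.asarray(reward_errors, dtype=float) ** 2
    logits = -combined / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max())  # numerically stable softmax
    return w / w.sum()

# Hypothetical per-module errors after observing one transition:
dyn_errors = [0.1, 0.9, 0.5]     # forward-model prediction errors
reward_errors = [0.2, 0.8, 0.4]  # reward-predictor errors
resp = mosaic_mr_responsibilities(dyn_errors, reward_errors)
best = int(np.argmax(resp))      # module whose predictors best explain the data
```

Here module 0 has the smallest combined error, so it receives the largest responsibility and would dominate both action selection and learning updates; the soft weighting (rather than a hard argmax) is what lets responsibility shift gradually as the hidden dynamics or reward function switches.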
Similar resources
Balancing Multiple Sources of Reward in Reinforcement Learning
For many problems which would be natural for reinforcement learning, the reward signal is not a single scalar value but has multiple scalar components. Examples of such problems include agents with multiple goals and agents with multiple users. Creating a single reward value by combining the multiple components can throw away vital information and can lead to incorrect solutions. We describe the...
Modelling Motivation as an Intrinsic Reward Signal for Reinforcement Learning Agents
Reinforcement learning agents require a learning stimulus in the form of a reward signal in order for learning to occur. Typically, this reward signal makes specific assumptions about the agent’s external environment, such as the presence of certain tasks which should be learned or the presence of a teacher to provide reward feedback. For many complex, dynamic environments, design time knowledg...
Investigation of The Relationships Between Reinforcement Sensitivity and Positive and Negative Emotion Dysregulation
Background and Aim: The primary purpose of this study was to investigate the relationships between reward sensitivity and punishment sensitivity and positive emotion regulation strategies and negative emotion regulation strategies among students. Materials and methods: 189 students studying at one of Tehran Universities were selected by accessible random sampling method, and then Emotion Regul...
Efficient Reward Functions for Adaptive Multi-rover Systems
This paper addresses how efficient reward methods can be applied to multiple agents co-evolving in noisy and changing environments, under communication limitations. This problem is approached by “factoring” a global reward over all agents into agent-specific rewards that have two key properties: 1) agents maximizing their agentspecific rewards will tend to maximize the global reward, 2) an agen...
Critical Role for the Mediodorsal Thalamus in Permitting Rapid Reward-guided Updating in Stochastic Reward Environments
Adaptive decision-making uses information gained when exploring alternative options to decide whether to update the current choice strategy. Magnocellular mediodorsal thalamus (MDmc) supports adaptive decision-making, but its causal contribution is not well understood. Monkeys with excitotoxic MDmc damage were tested on probabilistic three-choice decision-making tasks. They could...
Journal: Neural Computation
Volume 24, Issue 3
Pages: -
Publication date: 2012